replace SimpleNamespace configs with typed dataclasses and enums#24
replace SimpleNamespace configs with typed dataclasses and enums#24matthewcornell wants to merge 4 commits intomainfrom
Conversation
…ate pyproject.toml to pin pandas to 2.* (pandas 3.* introduced breaking changes). update uv.lock to use latest iddata commit
lshandross
left a comment
There was a problem hiding this comment.
I had a few comments, questions, and places where I think we should get Nick's input, but this looks so much better, Matt!
| class Disease(str, Enum): | ||
| FLU = "flu" | ||
| COVID = "covid" | ||
|
|
|
|
||
| class PowerTransform(str, Enum): | ||
| FOURTH_ROOT = "4rt" | ||
|
|
There was a problem hiding this comment.
Agreed, we should have a NONE option as well.
| theta_pooling: PoolingStrategy = PoolingStrategy.NONE | ||
| sigma_pooling: PoolingStrategy = PoolingStrategy.NONE | ||
| x: list = field(default_factory=list) | ||
|
|
There was a problem hiding this comment.
I think we should check with @nickreich if these are indeed the defaults we want for the SARIX model class parameters (and we should do the same for the GBQR model class)
There was a problem hiding this comment.
I don't feel that strongly about what the defaults are/should be. Defaulting to NONE seems reasonable to me.
src/idmodels/config.py
Outdated
|
|
||
|
|
||
| @dataclass | ||
| class SARIXRunConfig(RunConfig): |
There was a problem hiding this comment.
We discussed whether it made sense to move these parameters to the model class and decided on getting @nickreich 's opinion. I think it could make sense to eliminate SARIX and GBQR run config classes if possible since they store only a few parameters unique to the particular model
There was a problem hiding this comment.
Agreed! We could have documentation on some of the enums within the classes for users?
| date = datetime.date.fromisoformat("2024-01-06") | ||
| fips_codes = ["US", "01", "02", "04", "05"] # fewer locs for faster testing | ||
| # model_config = create_test_sarix_model_config(main_source=["nhsn"], theta_pooling="shared", sigma_pooling="none") | ||
| # model_config = create_test_sarix_model_config(main_source=[DataSource.NHSN], theta_pooling="shared", sigma_pooling="none") |
There was a problem hiding this comment.
I think that I had left this commented code in by mistake. We can get rid of it
| date = datetime.date.fromisoformat("2024-01-06") | ||
| fips_codes = ["US", "01", "02", "04", "05"] # fewer locs for faster testing | ||
| # model_config = create_test_sarix_model_config(main_source=["nhsn"], theta_pooling="shared", sigma_pooling="none") | ||
| # model_config = create_test_sarix_model_config(main_source=[DataSource.NHSN], theta_pooling="shared", sigma_pooling="none") |
There was a problem hiding this comment.
Again, I think we can get rid of this commented code
There was a problem hiding this comment.
I like this! One thing to keep in mind is that if model development goes through many iterations, we may end up having a lot of enums in the configs. I think we should keep some kind of a doc for their purpose and intention. E.g. what does "Pooling" mean to a user of the model (even if it may be obvious).
trobacker
left a comment
There was a problem hiding this comment.
This is great! I left a comment about documenting the enums for users of the code and supported some of @lshandross's comments. I'll leave approval for one more by Nick for now.
| theta_pooling: PoolingStrategy = PoolingStrategy.NONE | ||
| sigma_pooling: PoolingStrategy = PoolingStrategy.NONE | ||
| x: list = field(default_factory=list) | ||
|
|
There was a problem hiding this comment.
I don't feel that strongly about what the defaults are/should be. Defaulting to NONE seems reasonable to me.
|
Re: whether to keep I think the inference/fitting parameters (
On formalizing @dataclass
class SARIXModelConfig(ModelConfig):
# ... structural params ...
num_warmup: int = 2000
num_samples: int = 2000
num_chains: int = 1
def short_run(self):
"""Return a copy with reduced inference intensity for testing."""
return dataclasses.replace(self, num_warmup=100, num_samples=100)
@dataclass
class GBQRModelConfig(ModelConfig):
# ... structural params ...
num_bags: int = 100
bag_frac_samples: float = 0.7
def short_run(self):
return dataclasses.replace(self, num_bags=10)This way each model defines what "short run" means for itself, it's discoverable/documented, and the caller just does |
|
The above was drafted (obviously, I think) with Claude. But with substantial steering from me, especially towards the part of thinking of the short_run flag as part of how a model is defined formally. |
|
Thanks, @nickreich ! I've integrated your suggestions into the plan at this comment. |
|
@nickreich Re: whether to keep RunConfig subclasses or move inference params to ModelConfig:
|
|
I mean, I think we need to move it to the |
@lshandross , @trobacker What are your thoughts about |
|
Another suggestion would be to make it an optional argument (I don't know if that's the correct terminology) in the general run config that only gets used if the model is a GBQR. But I don't know if I have a strong preference between my suggestion and Nick's |
…o corresponding *ModelConfig classes. made RunConfig concrete. added Disease.RSV, PowerTransform.NONE, updated gbqr.py to handle Disease.RSV and PowerTransform.NONE
|
I've updated the code to include suggested changes. Please review d74fe38 . |
lshandross
left a comment
There was a problem hiding this comment.
I have one comment but it's not blocking. Thank you for making the changes, Matt!
|
|
||
| def create_test_sarix_run_config(ref_date, states, hsas, num, tmp_path): | ||
| run_config = SARIXRunConfig( | ||
| def create_test_sarix_run_config(ref_date, states, hsas, tmp_path): |
There was a problem hiding this comment.
I wonder if this helper function should be taken out of this script and moved into a separate one (the ones to create sarix and gbqr model configs could also be moved here), as well as renamed to be just create_test_run_config which can be reused by both sarix and gbqr models. If others agree, then maybe we can file an issue for it since it doesn't really belong in this PR
This PR is part of Operational models refactoring ideas #42. It is paired with the operational-models PR [replace SimpleNamespace configs with typed dataclasses and enums #43].
This PR replaces
SimpleNamespaceconfigs with typed dataclasses and enums. It is the first step in Operational models refactoring ideas #42. Specifically we introduce a dedicated config.py module with concrete, typed classes replacing ad-hocSimpleNamespaceobjects used for model and run configuration. Key changes:DataSource,Disease,PowerTransform, andPoolingStrategyenumsModelConfigandRunConfigdataclasses with concrete subclasses:SARIXModelConfig,SARIXFourierModelConfig,GBQRModelConfig,SARIXRunConfig, andGBQRRunConfig__init__.pySimpleNamespaceWe also: